Library Imports
from pyspark.sql import SparkSession
from pyspark.sql import types as T
from pyspark.sql import functions as F
from datetime import datetime
from decimal import Decimal
Template
spark = (
    SparkSession.builder
    .master("local")
    .appName("Section 2.3 - Creating New Columns")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)
sc = spark.sparkContext
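import os
# Build the CSV path relative to the parent of the current working directory.
data_path = "/data/pets.csv"
base_path = os.path.dirname(os.getcwd())
path = base_path + data_path
# header=True takes column names from the first row; with no schema given,
# every column is read as a string.
pets = spark.read.csv(path, header=True)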
pets.toPandas()
| | id | breed_id | nickname | birthday | age | color |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | King | 2014-11-22 12:30:31 | 5 | brown | 
| 1 | 2 | 3 | Argus | 2016-11-22 10:05:10 | 10 | None | 
| 2 | 3 | 1 | Chewie | 2016-11-22 10:05:10 | 15 | None | 
Creating New Columns and Transforming Data
When wrangling or transforming data, we usually assign the result to a new column. We will explore the withColumn() function and other transformation functions to achieve this.
We will also look at how to rename a column with withColumnRenamed(), which is useful, for example, when preparing two DataFrames for a join on a shared column name.
Case 1: New Columns - withColumn()
(
    pets
    .withColumn('nickname_copy', F.col('nickname'))
    .withColumn('nickname_capitalized', F.upper(F.col('nickname')))
    .toPandas()
)
| | id | breed_id | nickname | birthday | age | color | nickname_copy | nickname_capitalized |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | King | 2014-11-22 12:30:31 | 5 | brown | King | KING | 
| 1 | 2 | 3 | Argus | 2016-11-22 10:05:10 | 10 | None | Argus | ARGUS | 
| 2 | 3 | 1 | Chewie | 2016-11-22 10:05:10 | 15 | None | Chewie | CHEWIE | 
What Happened?
We duplicated the nickname column as nickname_copy using the withColumn() function. We also created a new column, nickname_capitalized, in which every letter of the nickname is upper-cased, by chaining multiple Spark functions together.
We will look into more advanced column creation in the next section. There we will go into more detail about what a column expression is and what F.col() is for.
Case 2: Renaming Columns - withColumnRenamed()
(
    pets
    .withColumnRenamed('id', 'pet_id')
    .toPandas()
)
| | pet_id | breed_id | nickname | birthday | age | color |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | King | 2014-11-22 12:30:31 | 5 | brown | 
| 1 | 2 | 3 | Argus | 2016-11-22 10:05:10 | 10 | None | 
| 2 | 3 | 1 | Chewie | 2016-11-22 10:05:10 | 15 | None | 
What Happened?
We renamed the id column to pet_id. The result contains pet_id in place of id; no extra column is added.
Summary
- We learned how to create new columns from existing ones by chaining Spark functions and using withColumn().
- We learned how to rename columns using withColumnRenamed().